The folding dynamics of small single-domain proteins is a current focus of simulations and
experiments. Many of these proteins are 'two-state folders', i.e. proteins that fold rather
directly from the denatured state to the native state, without populating metastable
intermediate states. A central question is how to characterize the unstable, partially folded
conformations of two-state proteins, in particular the rate-limiting transition-state
conformations between the denatured and the native state. These partially folded conformations
are short-lived and cannot be observed directly in experiments. However, experimental data
from detailed mutational analyses of the folding dynamics provide indirect access to
transition states. The interpretation of these data, in particular the reconstruction of
transition-state conformations, requires simulation and modeling. The traditional
interpretation of the mutational data aims to reconstruct the degree of structure formation of
individual residues in the transition state, while a novel interpretation aims at degrees of
structure formation of cooperative substructures such as α-helices and β-hairpins. By
splitting up mutation-induced free-energy changes into secondary and tertiary structural
components, the novel interpretation resolves some of the inconsistencies of the traditional
interpretation.



I. FOLDING DYNAMICS OF SMALL SINGLE-DOMAIN 
PROTEINS 

Proteins are biomolecules that participate in all cellular processes of living organisms.
Some proteins have structural or mechanical function, such as the protein collagen, which
provides the structural support of our connective tissues. Other proteins catalyze
biochemical reactions, transport or store electrons, ions, and small molecules, perform
mechanical work in our muscles, transmit information within or between cells, act as
antibodies in immune responses, or control the expression of genes and, thus, the generation
of other proteins [1]. Proteins achieve this functional versatility by folding into
different, unique three-dimensional structures (see fig. 1). The folding of proteins is a
spontaneous process of structure formation and a prerequisite for their robust function.
Misfolding can lead to protein aggregates that cause severe diseases, such as Alzheimer's,
Parkinson's, or the variant Creutzfeldt-Jakob disease [2].

How precisely proteins fold into their native, three-dimensional structure remains an
intriguing question [3, 4]. Given the vast number of unfolded conformations of the flexible
protein chain, Cyrus Levinthal argued in 1968 [5, 6] that proteins are guided to their native
structure by a sequence of folding intermediates. In the following decades, experimentalists
focused on detecting and characterizing metastable folding intermediates of proteins [7]. The
view that proteins have to fold in sequential pathways from intermediate to intermediate, now
known as the 'old view' [8, 9], changed in the 1990s when statistical-mechanical models
demonstrated that fast and efficient folding can also be achieved on funnel energy landscapes
that are smoothly biased towards the native state [10, 11]. The stochastic folding process on
these landscapes is highly parallel, and partially folded states along the parallel folding
routes are unstable rather than metastable. The paradigmatic proteins of this 'new view' are
two-state proteins, first discovered in 1991 [12]. Two-state proteins fold from the denatured
state to the native state without experimentally detectable intermediate states. Since then,
the majority of small single-domain proteins with a length of up to 100 or 120 amino acids
has been shown to fold with apparent two-state kinetics, while larger multi-domain proteins
often exhibit metastable folding intermediates [13, 14, 15].

FIG. 1: The structure of the protein CI2 consists of an α-helix packed against a
four-stranded β-sheet [80]. CI2 is a two-state protein that folds from the denatured state to
the native state without experimentally detectable intermediate states [12].

The simplest model for a two-state process is classical transition-state theory. In
transition-state theory, the folding rate of a two-state protein is assumed to have the form
(see, e.g., [14])

$$k = k_0 \exp\left[-G_{T-D}/RT\right] \qquad (1)$$

where $G_{T-D}$ is the free-energy difference between the transition state T and the
denatured state D (see fig. 2(a)), and $k_0$ is a prefactor that depends on the
conformational diffusion coefficient of the protein. Classical transition-state theory thus
assumes a third state, the transition state T, that governs the folding kinetics. From a
statistical-mechanical perspective of protein folding, the transition state T, the denatured
state D, and the native state N are ensembles of conformations. The denatured state is a huge
ensemble of largely unstructured protein conformations, while the folded, native state
corresponds to a rather narrow ensemble that captures the thermal fluctuations in this state.
The transition state can be defined as an ensemble of partially folded conformations with
equal probability to fold or unfold [16, 17, 18]. According to this definition, a trajectory
that passes through a transition-state conformation has the same probability, 0.5, of
proceeding from this conformation to the native state or to the denatured state.
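The exponential dependence of the rate on the barrier in eq. (1) can be made concrete with a minimal sketch. The prefactor `k0`, the temperature, and the barrier heights below are hypothetical values chosen only for illustration; they are not taken from the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def folding_rate(barrier, k0, temperature=298.0):
    """Eq. (1): k = k0 * exp(-G_TD / RT), with the barrier in kcal/mol."""
    return k0 * math.exp(-barrier / (R_KCAL * temperature))


# Hypothetical numbers, not taken from the text.
k0 = 1.0e6  # assumed prefactor in 1/s
for barrier in (3.0, 5.0, 7.0):
    rate = folding_rate(barrier, k0)
    print(f"G_TD = {barrier:.1f} kcal/mol -> k = {rate:.3g} 1/s")
```

In this form, raising the barrier by $RT \ln 10$ (roughly 1.4 kcal/mol at room temperature) lowers the rate by a factor of ten.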

The folding times of small single-domain proteins range from microseconds to seconds
[13, 15, 19]. An important observation was that these folding times correlate with the
average 'localness' of contacts between amino acids in the folded state [20, 21]. A local
contact is a contact between two amino acids that are close in sequence, for example a
contact between two amino acids in adjacent turns of an α-helix. Proteins with predominantly
local contacts, such as α-helical proteins, tend to fold faster than proteins with many
nonlocal, sequence-distant contacts. The physical principle that underlies this correlation
between folding times and average localness of contacts seems to be loop closure [22, 23],
since local contacts can be formed by fast closure of small loops [24, 25].
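One simple proxy for the 'localness' of contacts is the mean sequence separation |i - j| over all native contacts. The sketch below assumes this particular measure and uses two idealized, hypothetical contact lists; it is not the specific quantity used in the cited studies.

```python
def mean_sequence_separation(contacts):
    """Average |i - j| over a list of residue-residue contacts (i, j)."""
    if not contacts:
        raise ValueError("contact list is empty")
    return sum(abs(i - j) for i, j in contacts) / len(contacts)


# Idealized, hypothetical contact maps (residue indices only).
helix_like = [(i, i + 4) for i in range(1, 20)]     # i, i+4 contacts along a helix
hairpin_like = [(i, 30 - i) for i in range(1, 12)]  # strand pairing across a long loop

print(mean_sequence_separation(helix_like))
print(mean_sequence_separation(hairpin_like))
```

A smaller mean separation corresponds to more local contacts and, following the correlation described above, to a tendency towards faster folding.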

Molecular dynamics (MD) simulations with detailed, atomistic models of proteins have been
used to study the dynamics of small, fast-folding proteins with folding times in the
microsecond range [26, 27, 28, 29, 30, 31]. One of the best-studied proteins is the villin
headpiece, an α-helical protein with 36 amino acids. Central questions are whether folding
simulations with current force fields reach the correct, experimentally determined folded
state of a protein from unfolded conformations, and whether the dynamics of folding events
observed in these simulations agrees with experimental data. In the case of the villin
headpiece, MD simulations of several groups have reached the folded state of the protein
[26, 30], whereas folding simulations of a fast-folding WW domain, a β-sheet protein, have
only reached structures with incorrect topology [31].



II. MUTATIONAL ANALYSIS OF TWO-STATE PROTEIN 
FOLDING 

Since transition-state conformations of two-state proteins are unstable and, thus,
short-lived, they cannot be observed directly in experiments. The most important indirect
experimental method to investigate the folding dynamics of two-state proteins is mutational
analysis [14]. In a mutational analysis, a large number of mostly single-residue mutants of a
protein is generated, and the folding rate $k$ and stability $G_{N-D}$ of each mutant are
determined. The stability $G_{N-D}$ of a protein is the free-energy difference between the
native state N and the denatured state D.

FIG. 2: (a) In classical transition-state theory, the folding kinetics of a two-state protein
is dominated by a transition state T between the denatured state D and the native state N.
The folding rate depends on the difference $G_{T-D} = G_T - G_D$ between the free energy
$G_T$ of the transition state T and the free energy $G_D$ of the denatured state D, see
eq. (1). (b) Mutations perturb the free energies of the denatured state, transition state,
and native state.

The effect of each mutation on the folding dynamics is typically quantified by its
$\Phi$-value [14, 32]

$$\Phi = \frac{RT \ln(k/k')}{\Delta G_{N-D}} \qquad (2)$$

Here, $k$ is the folding rate for the wildtype protein, $k'$ is the folding rate for the
mutant protein, and $\Delta G_{N-D} = G_{N'-D'} - G_{N-D}$ is the change of the protein
stability induced by the mutation. $G_{N'-D'}$ and $G_{N-D}$ denote the stabilities of the
mutant and the wildtype, see fig. 2(b). With eq. (1), $\Phi$-values can be written in the
form

$$\Phi = \frac{\Delta G_{T-D}}{\Delta G_{N-D}} \qquad (3)$$

if one assumes that the pre-exponential factor $k_0$ is not affected by the mutation [14].
Here, $\Delta G_{T-D} = G_{T'-D'} - G_{T-D}$ is the mutation-induced change of the
free-energy barrier $G_{T-D}$, see fig. 2(b).
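As a minimal sketch of how eq. (2) is evaluated in practice, the function below converts a pair of folding rates and a stability change into a $\Phi$-value. The rates and the stability change in the example call are hypothetical, and the function name and signature are illustrative only.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def phi_value(k_wildtype, k_mutant, ddg_stability, temperature=298.0):
    """Eq. (2): Phi = RT ln(k/k') / dG_N-D, with free energies in kcal/mol."""
    if ddg_stability == 0:
        raise ValueError("Phi is undefined for a mutation with zero stability change")
    # Numerator of eq. (3): mutation-induced change of the folding barrier.
    ddg_barrier = R_KCAL * temperature * math.log(k_wildtype / k_mutant)
    return ddg_barrier / ddg_stability


# Hypothetical example: the mutant folds more slowly and is destabilized by 1 kcal/mol.
phi = phi_value(k_wildtype=50.0, k_mutant=30.0, ddg_stability=1.0)
print(f"Phi = {phi:.2f}")
```

Because the stability change appears in the denominator, small values of $\Delta G_{N-D}$ amplify measurement errors in the rates, which is one reason why mutations with small stability changes are treated with caution.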

In the past decade, the folding dynamics of several dozen two-state proteins has been
investigated with mutational $\Phi$-value analyses (for references, see, e.g., [33]). An
example of data from a mutational analysis of the protein CI2 [34] is shown in table 1. The
single-residue mutations of table 1 are all located in the α-helix of the protein CI2, which
comprises the residues 12 to 24 of this protein (see fig. 1). In the mutation S12G, for
example, the amino acid 12 of the wildtype, serine (single-letter code S), is replaced by the
smaller amino acid glycine (single-letter code G). The experimentally measured $\Phi$-value
for this mutation is 0.29, and the experimentally measured change in stability $\Delta G_N$
is 0.8 kcal/mol.

The central question is whether we can reconstruct the transition state of a two-state
protein from the observed $\Phi$-values for a large number of mutants [14, 35, 36, 37, 38].
In the standard interpretation of $\Phi$-values, a $\Phi$-value of 1 is interpreted to
indicate that the residue has a native-like structure in T, since the mutation shifts the
free energy of the transition state T by the same amount as the free energy of the native
state N. A $\Phi$-value of 0 is interpreted to indicate that the residue is as unstructured
in T as in the denatured state D, since the mutation does not shift the free-energy
difference between these two states. $\Phi$-values between 0 and 1 are typically taken to
indicate partially native-like structure in T [14, 35]. In the traditional interpretation, a
$\Phi$-value thus is taken to indicate the degree of structure formation of the mutated
residue in the transition-state ensemble T.

TABLE I: Mutational data for the helix of the protein CI2. Experimental $\Phi$-values and
stability changes $\Delta G_N$ are from Itzhaki et al. [34]. The change in intrinsic helix
stability $\Delta G_\alpha$ is calculated with AGADIR [77, 78, 79], see Merlo et al. [38].
The program AGADIR is based on helix/coil transition theory, with parameters fitted to data
from circular dichroism (CD) spectroscopy. The free-energy changes are in units of kcal/mol.
We only consider mutations with $\Delta G_N > 0.7$ kcal/mol, since $\Phi$-values for
mutations with smaller values of $\Delta G_N$ are often considered to be unreliable
[35, 67, 69].



However, the traditional interpretation is often not consistent. First, some $\Phi$-values
are negative or larger than 1 [39, 40] and cannot be interpreted as a degree of structure
formation. An example is the negative $\Phi$-value -0.25 for the mutation D23A in the α-helix
of CI2 (see table 1). Second, $\Phi$-values are sometimes significantly different for
different mutations at a given chain position. The mutations E15D and E15N in the helix of
the protein CI2, for example, have $\Phi$-values of 0.22 ± 0.05 and 0.53 ± 0.05 [34], which
differ by more than a factor of 2 (see table 1). In the traditional interpretation, however,
$\Phi$-values for different mutations of the same residue are expected to be identical, since
they just reflect the degree of structure formation of this residue in T. Third,
$\Phi$-values for neighboring residues within a given secondary structure often span a wide
range of values. The $\Phi$-values shown in table 1 for mutations in the CI2 helix range from
-0.25 to 1.06. According to the traditional interpretation, this implies that some of the
helical residues are unstructured in the transition state, while other residues, often direct
neighbors, are highly structured. The traditional interpretation thus seems to contradict the
notion that secondary structures are cooperative. In standard helix-coil models [41, 42, 43],
the formation of helices requires that several consecutive helical turns are structured,
stabilizing each other.

$\Phi$-values provide indirect information on the folding kinetics of a protein and,
therefore, have attracted considerable theoretical interest. To understand the experimentally
determined $\Phi$-values for a protein, molecular dynamics (MD) simulations with atomistic
models are often performed [44, 45, 46, 47, 48, 49, 50, 51, 52, 53]. Such simulations are
computationally demanding and in general do not allow direct calculations of folding rates
and $\Phi$-values. Instead, the MD approaches typically rely on the assumption of the
traditional interpretation that $\Phi$-values reflect the degree of structure formation of
residues in the transition state T. For example, $\Phi$-values are often calculated from the
fraction of contacts a residue forms in the transition state T, compared to the fraction of
contacts in the native and the denatured states [44, 45, 46, 47, 48, 49, 54]. In an
alternative approach, Daggett and coworkers compute an S-value [50], which is "a measure of
the amount of structure at a given residue, defined by the amounts of secondary and tertiary
structure at each residue" [51]. Exceptions to such structural assumptions are a recent MD
study of an ultrafast mini-protein in which $\Phi$-values are calculated from rates for the
wildtype and mutants via eq. (2) [52], and the calculation of $\Phi$-values from free-energy
shifts of the transition-state ensemble using eq. (3) [53].

In the following sections, we will consider statistical-mechanical models that lead to a
novel structural interpretation of mutational $\Phi$-values. The general conclusion from
these models is that a consistent structural interpretation of $\Phi$-values (i) requires
splitting mutation-induced stability changes into free-energy contributions from different
substructural elements of a protein, and (ii) can be obtained with few parameters that
characterize the degree of structure formation of cooperative substructures such as
α-helices and β-hairpins in the transition-state ensemble.

III. FORMATION OF HELICES DURING PROTEIN 
FOLDING 

In this section, we present a simple model for the formation of α-helices during protein
folding. The model will lead to a consistent structural interpretation of the mutational data
for the CI2 helix shown in table 1 and for other helices. In particular, the model reproduces
the negative $\Phi$-value for the mutation D23A in this helix, which cannot be understood in
the traditional interpretation of $\Phi$-values (see last section).

The model has two main ingredients. First, the central assumption is that a helix, or a
segment of a helix, is either fully formed or not formed in partially folded conformations,
in particular in transition-state conformations. The transition state is described as an
ensemble of M different conformations (see fig. 3). Each transition-state conformation is
directly connected to the native state N and to the denatured state D. The model thus has M
parallel folding and unfolding routes.

Second, mutation-induced free-energy changes are split into two components. The overall
stability change $\Delta G_N$ is split into the change in intrinsic helix stability
$\Delta G_\alpha$ and the free-energy change $\Delta G_t$ of tertiary interactions caused by
the mutation:

$$\Delta G_N = \Delta G_\alpha + \Delta G_t \qquad (4)$$

The intrinsic helix stability $G_\alpha$ is the stability of the 'isolated' helix, i.e. the
free-energy difference between the folded and the unfolded state of the helix, in the absence
of tertiary interactions with other structural elements. Similarly, we decompose each
$\Delta G_m$, the mutation-induced free-energy change for the transition-state conformation
m, into two terms:

$$\Delta G_m = s_m \Delta G_\alpha + t_m \Delta G_t \qquad (5)$$

Here, $G_m$ is the free-energy difference between transition-state conformation m and the
denatured state. Because we assume cooperative formation of the helix, or helical segment,
$s_m$ is either 1 or 0, depending on whether the segment is formed or not in the
transition-state conformation m. The coefficient $t_m$ is between 0 and 1 and represents the
degree of tertiary structure formation in conformation m.

FIG. 3: In our model, the transition-state ensemble T consists of M transition-state
conformations $T_1, T_2, \ldots, T_M$. The arrows indicate the folding direction from the
denatured state D to the native state N via the transition-state conformations.

We assume that the free-energy barrier for each transition-state conformation is
significantly larger than the thermal energy, i.e. that $G_m/RT \gg 1$ [55, 56]. The rate of
folding along each route m is then proportional to $\exp[-G_m/RT]$, and the total folding
rate is the sum [33]

$$k = c \sum_{m=1}^{M} e^{-G_m/RT} \qquad (6)$$

of the rates along the M parallel routes. Here, c is a constant prefactor.
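The sum in eq. (6) can be sketched numerically as follows. The three barrier heights and the choice c = 1 are hypothetical; the point is only that each route contributes its own Boltzmann factor and that routes with lower barriers carry a larger share of the total rate.

```python
import math

RT = 0.592  # kcal/mol, approximately R*T at 298 K


def total_folding_rate(barriers, c=1.0, rt=RT):
    """Eq. (6): k = c * sum_m exp(-G_m / RT) over M parallel routes."""
    return c * sum(math.exp(-g / rt) for g in barriers)


# Hypothetical barriers (kcal/mol) for M = 3 transition-state conformations.
barriers = [4.0, 5.0, 6.0]
k = total_folding_rate(barriers)

# Share of each route in the total rate (valid here because c = 1).
fractions = [math.exp(-g / RT) / k for g in barriers]
print([round(f, 3) for f in fractions])
```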

The folding rate for a mutant then is
$k' = k(G_1 + \Delta G_1, G_2 + \Delta G_2, \ldots, G_M + \Delta G_M)$ with $k$ given in
eq. (6). We assume here that the mutations do not affect the prefactor c in eq. (6). For
small values $|\Delta G_m|$ of the mutation-induced free-energy changes, a Taylor expansion
of $\ln k'$ to first order leads to

$$\ln k' - \ln k \simeq -\frac{1}{RT}\,
\frac{\sum_{m=1}^{M} \Delta G_m\, e^{-G_m/RT}}{\sum_{m=1}^{M} e^{-G_m/RT}} \qquad (7)$$

With the decomposition of the $\Delta G_m$'s in eq. (5), we obtain

$$\ln k' - \ln k \simeq -\frac{1}{RT}\left(\chi_\alpha \Delta G_\alpha + \chi_t \Delta G_t\right) \qquad (8)$$

with the two terms

$$\chi_\alpha = \frac{\sum_{m=1}^{M} s_m\, e^{-G_m/RT}}{\sum_{m=1}^{M} e^{-G_m/RT}} \qquad (9)$$

$$\chi_t = \frac{\sum_{m=1}^{M} t_m\, e^{-G_m/RT}}{\sum_{m=1}^{M} e^{-G_m/RT}} \qquad (10)$$

The term $\chi_\alpha$ is the Boltzmann-weighted average of the secondary structure parameter
$s_m$ in the transition-state ensemble T. The value $\chi_\alpha = 1$ indicates that the
helix is formed in all transition-state conformations m, while $\chi_\alpha = 0$ indicates
that the helix is formed in none of the transition-state conformations. Values of
$\chi_\alpha$ between 0 and 1 indicate that the helix is formed in some of the
transition-state conformations, and not formed in others. The term $\chi_t$ represents the
Boltzmann-weighted average of the tertiary structure parameter $t_m$ in T.
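The first-order result can be checked numerically. The sketch below builds a small, purely hypothetical ensemble of three transition-state conformations, computes the Boltzmann-weighted averages of $s_m$ and $t_m$, and compares the exact change of $\ln k$ under a small perturbation with the linear estimate $-(\chi_\alpha \Delta G_\alpha + \chi_t \Delta G_t)/RT$. All numbers are assumptions made for illustration.

```python
import math

RT = 0.592  # kcal/mol, approximately R*T at 298 K


def boltzmann_weights(barriers, rt=RT):
    """Normalized weights exp(-G_m/RT) / sum_m exp(-G_m/RT)."""
    factors = [math.exp(-g / rt) for g in barriers]
    total = sum(factors)
    return [f / total for f in factors]


def weighted_average(values, barriers, rt=RT):
    """Boltzmann-weighted average of a per-conformation parameter."""
    return sum(w * v for w, v in zip(boltzmann_weights(barriers, rt), values))


def log_rate(barriers, rt=RT):
    """ln k up to the constant ln c, which cancels in ln k' - ln k."""
    return math.log(sum(math.exp(-g / rt) for g in barriers))


# Hypothetical ensemble: barriers G_m, helix formed (s_m), tertiary degree (t_m).
G = [4.0, 4.5, 5.0]
s = [1, 1, 0]
t = [0.2, 0.5, 0.1]
dG_alpha, dG_t = 0.05, 0.03  # small hypothetical perturbations in kcal/mol

chi_alpha = weighted_average(s, G)
chi_t = weighted_average(t, G)

# Perturb each barrier by s_m * dG_alpha + t_m * dG_t and compare.
G_mut = [g + sm * dG_alpha + tm * dG_t for g, sm, tm in zip(G, s, t)]
exact = log_rate(G_mut) - log_rate(G)
first_order = -(chi_alpha * dG_alpha + chi_t * dG_t) / RT
print(exact, first_order)
```

For perturbations this small, the two numbers agree closely; the remaining difference is the second-order term that the expansion neglects.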

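From eq. (8) and the definition in eq. (2), we then obtain the general form [33]

$$\Phi = \frac{\chi_\alpha \Delta G_\alpha + \chi_t \Delta G_t}{\Delta G_N}
= \chi_t + (\chi_\alpha - \chi_t)\,\frac{\Delta G_\alpha}{\Delta G_N} \qquad (11)$$

of $\Phi$-values for mutations in helices. The second expression simply results from
replacing $\Delta G_t$ by $\Delta G_N - \Delta G_\alpha$, see eq. (4).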
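The following sketch evaluates eq. (11) and illustrates how a $\Phi$-value outside the range from 0 to 1 can arise. The structural parameters and free-energy changes in the example are hypothetical and chosen only to make the arithmetic transparent.

```python
def phi_helix(chi_alpha, chi_t, ddg_alpha, ddg_n):
    """Eq. (11): Phi for a helix mutation from secondary and tertiary contributions."""
    ddg_t = ddg_n - ddg_alpha  # eq. (4)
    return (chi_alpha * ddg_alpha + chi_t * ddg_t) / ddg_n


# Hypothetical mutation that stabilizes the helix (ddg_alpha < 0) but destabilizes
# tertiary interactions (ddg_t > 0), with an overall destabilization of 1 kcal/mol.
phi = phi_helix(chi_alpha=1.0, chi_t=0.2, ddg_alpha=-0.5, ddg_n=1.0)
print(phi)
```

Here the secondary contribution (-0.5) and the tertiary contribution (0.2 × 1.5 = 0.3) have opposite signs, so their sum divided by the stability change is negative even though both structural parameters lie between 0 and 1.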

The analysis of experimental $\Phi$-values and stability changes $\Delta G_N$ with eq. (11)
requires an estimate of the mutation-induced changes $\Delta G_\alpha$ of the intrinsic helix
stability.

FIG. 4: Analysis of the mutational data for the α-helix of CI2 shown in table 1. In
agreement with eq. (11), we observe an approximately linear relation between $\Phi$ and
$\Delta G_\alpha/\Delta G_N$ with a Pearson correlation coefficient of 0.91 [33]. From the
regression line $\Phi = 0.16 + 0.87\,\Delta G_\alpha/\Delta G_N$, we obtain the structural
parameters $\chi_\alpha = 1.03 \pm 0.05$ and $\chi_t = 0.16 \pm 0.05$. The structural
parameter $\chi_\alpha$ close to 1 indicates that the helix is fully formed in the transition
state, while the parameter $\chi_t$ indicates that tertiary interactions with the β-sheet are
on average formed to a degree around 16 %. The estimated standard deviation of data points
from the regression line is 0.14 [33], which is comparable to the experimental errors
[34, 66].

FIG. 5: Analysis of mutational data for helix 2 of protein A. The solid line represents the
regression line $\Phi = 0.46 + 0.52\,\Delta G_\alpha/\Delta G_N$. The Pearson correlation
coefficient of the data points is 0.93, and the estimated standard deviation of the data
points from the regression line is 0.10. From the regression line and eq. (11), we obtain the
structural parameters $\chi_\alpha = 0.98 \pm 0.05$ and $\chi_t = 0.46 \pm 0.05$. Values of
$\Delta G_\alpha$ for the mutations have been estimated from a helix propensity scale [33].



For the mutations in the CI2 helix shown in table 1, we have calculated $\Delta G_\alpha$
with the program AGADIR [38]. In agreement with eq. (11), we observe a linear relation
between $\Phi$ and $\Delta G_\alpha/\Delta G_N$ for the data shown in table 1, within
reasonable errors (see fig. 4). The structural parameters $\chi_\alpha$ and $\chi_t$ can be
estimated from the slope of the regression line and from its intersection with the y-axis:
according to eq. (11), the intercept is $\chi_t$ and the slope is $\chi_\alpha - \chi_t$.
For the CI2 helix, we obtain the values $\chi_\alpha = 1.03 \pm 0.05$ and
$\chi_t = 0.16 \pm 0.05$ [33], which implies that the helix is fully formed in the
transition state, while tertiary interactions with the β-sheet are formed to an average
degree of around 16 %.
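The step from a regression line to the two structural parameters follows directly from eq. (11) and can be written in a few lines. The intercepts and slopes below are the values quoted in the captions of figs. 4 and 5; the function name is illustrative.

```python
def structural_parameters(intercept, slope):
    """Read chi_alpha and chi_t off a line Phi = intercept + slope * (dG_alpha / dG_N).

    From eq. (11): intercept = chi_t and slope = chi_alpha - chi_t.
    """
    chi_t = intercept
    chi_alpha = intercept + slope
    return chi_alpha, chi_t


ci2_helix = structural_parameters(intercept=0.16, slope=0.87)
protein_a_helix2 = structural_parameters(intercept=0.46, slope=0.52)
print(ci2_helix, protein_a_helix2)
```

The error estimates of ± 0.05 quoted in the text come from the regression itself and are not reproduced by this sketch.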

In this model, the different $\Phi$-values for the mutations in the CI2 helix arise from
different 'free-energy signatures' $\Delta G_\alpha$ and $\Delta G_N$ of the mutations. In
particular, the model captures the negative $\Phi$-value for the mutation D23A. According to
eq. (11), negative $\Phi$-values or $\Phi$-values larger than 1 can arise if the
mutation-induced changes $\Delta G_\alpha$ and $\Delta G_t = \Delta G_N - \Delta G_\alpha$
in secondary and tertiary free energy have opposite signs. We find that the mutation D23A
stabilizes the helix ($\Delta G_\alpha < 0$), but destabilizes tertiary interactions
($\Delta G_t > 0$).

The model leads to a consistent structural interpretation of the mutational data for several
helices [33]. Besides the CI2 helix, another helix for which a large number of mutational
$\Phi$-values have been measured is helix 2 of the three-helix protein A. An analysis of the
experimental data with eq. (11) leads to the structural parameters
$\chi_\alpha = 0.98 \pm 0.05$ and $\chi_t = 0.46 \pm 0.05$ (see fig. 5). The value of
$\chi_\alpha$ close to 1 indicates that the helix is fully formed in the transition state,
and the value of $\chi_t$ close to 0.5 indicates that tertiary interactions with the other
two helices of the protein are present to an average degree of about 50 %.



IV. FOLDING OF SMALL β-SHEET PROTEINS

In this section, we model mutational data for the folding dynamics of small β-sheet
proteins. The smallest β-proteins have just three β-strands. Important representatives of
this class of proteins are WW domains (see fig. 6), named after two conserved tryptophan
residues, which are represented by the letter W in the single-letter code for amino acids.
WW domains are central model systems for understanding β-sheet folding and stability
[57, 58, 59, 60, 61].

The fastest three-stranded β-proteins fold in microseconds and are, thus, good targets for
MD folding simulations with atomistic models (see section I). For a small, designed
three-stranded β-sheet protein, beta3s, the transition-state conformations have been
determined from extensive folding-unfolding MD simulations [62]. The native structure of
beta3s is similar to the structure of WW domains, with two β-hairpins forming an antiparallel
three-stranded β-sheet. By identifying clusters of structurally similar conformations that
have the same probability to fold or unfold, Rao et al. [62] obtained a transition-state
ensemble for beta3s in which either hairpin 1 or hairpin 2 is structured, while the other
hairpin is unstructured. The two β-hairpins of beta3s thus appear to be cooperative
substructures that are fully structured or unstructured in the transition state.

In the statistical-mechanical model for three-stranded β-sheet proteins considered here, we
assume a beta3s-like transition-state ensemble in which either hairpin 1 or hairpin 2 is
formed (see fig. 7). The model has two folding routes: On one of the routes, hairpin 1 forms
before hairpin 2, and on the other route, after hairpin 2. The energy landscape of this model
can be characterized by three free-energy differences: The free-energy difference $G_N$ of
the native state and the free-energy differences $G_1$ and $G_2$ of the two transition-state
conformations with respect to the denatured state (see fig. 7). For large transition-state
barriers $G_1$ and $G_2$, the folding rate is [63]

$$k \simeq c\left(e^{-G_1/RT} + e^{-G_2/RT}\right) \qquad (12)$$

Up to a constant prefactor, written here as c in analogy to eq. (6), the folding rate $k$ is
the sum of the rates for the two folding routes.

FIG. 6: (a) The native structure of the FBP WW domain consists of two β-hairpins, which form
a three-stranded β-sheet [81]. (b) Contact matrix of the FBP domain. A black dot at position
(i, j) of the matrix indicates that the residues i and j are in contact. Two residues are
defined here to be in contact if the distance between any of their non-hydrogen atoms is
smaller than the cutoff distance 4 Å. Contacts between nearest- and next-nearest neighboring
residues are not considered (grey dots). The hairpins 1 and 2 of the WW domains correspond to
clusters of contacts. The remaining contacts largely correspond to contacts of hydrophobic
amino acids, the small hydrophobic core of the protein [63].

FIG. 7: Simple energy landscape of the four-state model for three-stranded β-sheet proteins.
The four states are the denatured state D, the native state N, and two partially folded
states hp 1 and hp 2 in which one of the two hairpins is formed. Here, $G_N$ is the
free-energy difference between the native state N and the denatured state D, which has the
'reference free energy' $G_D = 0$, and $G_1$ and $G_2$ are the free-energy differences
between the transition-state conformations and the denatured state.

Mutations correspond to perturbations of the free-energy landscape. In this model, a
mutation can be characterized by the free-energy changes $\Delta G_1$, $\Delta G_2$, and
$\Delta G_N$. The folding rate of the mutant then is
$k' = k(G_1 + \Delta G_1, G_2 + \Delta G_2)$. For small perturbations $\Delta G_1$ and
$\Delta G_2$, a Taylor expansion of $\ln k'$ to first order leads to

$$\ln k' - \ln k \simeq -\frac{1}{RT}\left(\chi_1 \Delta G_1 + \chi_2 \Delta G_2\right) \qquad (13)$$

with

$$\chi_1 = \frac{e^{-G_1/RT}}{e^{-G_1/RT} + e^{-G_2/RT}} \qquad (14)$$

$$\chi_2 = \frac{e^{-G_2/RT}}{e^{-G_1/RT} + e^{-G_2/RT}} \qquad (15)$$

The two parameters $\chi_1$ and $\chi_2$ are the probabilities that conformation 1 with
hairpin 1 and conformation 2 with hairpin 2 are populated in the transition-state ensemble.
From the $\Phi$-value definition (2) and eq. (13), we obtain the general form

$$\Phi = \frac{\chi_1 \Delta G_1 + \chi_2 \Delta G_2}{\Delta G_N} \qquad (16)$$

of $\Phi$-values for mutations in three-stranded β-sheet proteins.
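A minimal sketch of the two-route model: the route probabilities follow from the two barriers as normalized Boltzmann factors, and the $\Phi$-value for a mutation is the probability-weighted sum of the hairpin free-energy changes divided by the stability change. The barrier heights and free-energy changes below are hypothetical, and the function names are illustrative.

```python
import math

RT = 0.592  # kcal/mol, approximately R*T at 298 K


def route_probabilities(g1, g2, rt=RT):
    """Probabilities chi_1 and chi_2 of the two transition-state conformations."""
    w1 = math.exp(-g1 / rt)
    w2 = math.exp(-g2 / rt)
    total = w1 + w2
    return w1 / total, w2 / total


def phi_two_routes(chi1, ddg1, ddg2, ddg_n):
    """Phi = (chi_1 dG_1 + chi_2 dG_2) / dG_N, using chi_2 = 1 - chi_1."""
    return (chi1 * ddg1 + (1.0 - chi1) * ddg2) / ddg_n


# Hypothetical barriers in kcal/mol: the route via hairpin 1 has the lower barrier.
chi1, chi2 = route_probabilities(4.0, 4.7)

# Hypothetical mutation that affects only hairpin 1: dG_2 = 0 and dG_N = dG_1.
phi_hairpin1_only = phi_two_routes(chi1, ddg1=1.0, ddg2=0.0, ddg_n=1.0)
print(chi1, chi2, phi_hairpin1_only)
```

In this model, a mutation that perturbs only one hairpin has a $\Phi$-value equal to the probability of the corresponding route, independent of the size of the perturbation.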

A detailed mutational analysis of the folding kinetics of the FBP WW domain shown in fig. 6
has been performed by Petrovich et al. [61]. In general, mutations can affect hairpin 1,
hairpin 2, or the small hydrophobic core of the protein. Interestingly, eq. (16) predicts
that all mutations that affect, e.g., only hairpin 1 should have the same $\Phi$-value
$\chi_1$, since we have $\Delta G_2 = 0$ and $\Delta G_N = \Delta G_1$ for these mutations.
This is indeed the case, except for one outlier (see fig. 8). The $\Phi$-values of the
remaining nine mutations that affect only hairpin 1 of the FBP domain are centered around the
mean value 0.81 (dashed line in fig. 8), mostly within experimental errors. The mean value of
these nine $\Phi$-values leads to the estimate $\chi_1 = 0.81 \pm 0.06$ [63]. Similarly, the
four $\Phi$-values for mutations that affect only hairpin 2 are centered around a mean value
$\chi_2 = 0.30 \pm 0.08$ [63]. Within the statistical errors, these two estimates for
$\chi_1$ and $\chi_2$ sum up to 1, which is a consistency requirement of our model since the
protein has to take one of the two possible routes to the native state (see fig. 7). The two
parameters $\chi_1$ and $\chi_2$ are the probabilities for the two routes.

To include other mutations in the model, we have to estimate the impact of these mutations on
the stability of the different structural elements (hairpin 1, hairpin 2, or the hydrophobic
core) they affect. We have used the program FOLD-X [64, 65] to calculate these stability
changes [63]. The structural parameters $\chi_1$ and $\chi_2$ then can be obtained from a
least-square fit of eq. (16) to the experimental data (see fig. 9), with a single fit
parameter since $\chi_1 + \chi_2 = 1$. The structural information obtained from this fit is
that the transition-state ensemble of the FBP WW domain consists to roughly 3/4 of
conformation 1 with hairpin 1 formed, and to 1/4 of conformation 2 with hairpin 2 formed.
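Because the two route probabilities sum to 1, the model $\Phi$-value is linear in the single parameter $\chi_1$, and a least-squares fit has a closed-form solution. The sketch below shows one way to carry out such a fit; it is not necessarily the procedure used in the cited work. The free-energy changes are synthetic placeholders, not FOLD-X output, and the 'experimental' $\Phi$-values are generated from an assumed $\chi_1 = 0.77$ so that the fit can be checked.

```python
def fit_chi1(phi_exp, ddg1, ddg2, ddg_n):
    """Least-squares estimate of chi_1 for Phi = (chi_1 dG_1 + (1 - chi_1) dG_2) / dG_N.

    Writing Phi_i = b_i + chi_1 * a_i with a_i = (dG_1 - dG_2) / dG_N and
    b_i = dG_2 / dG_N, the minimum of sum_i (Phi_i - b_i - chi_1 a_i)^2 is at
    chi_1 = sum_i a_i (Phi_i - b_i) / sum_i a_i^2.
    """
    numerator = 0.0
    denominator = 0.0
    for phi, g1, g2, gn in zip(phi_exp, ddg1, ddg2, ddg_n):
        a = (g1 - g2) / gn
        b = g2 / gn
        numerator += a * (phi - b)
        denominator += a * a
    if denominator == 0.0:
        raise ValueError("data do not constrain chi_1 (dG_1 equals dG_2 for all mutations)")
    return numerator / denominator


# Synthetic free-energy changes in kcal/mol (hypothetical, for illustration only).
# dG_N may exceed dG_1 + dG_2 when the hydrophobic core is also affected.
ddg1 = [1.0, 0.0, 0.5, 1.5]
ddg2 = [0.0, 1.2, 0.5, -0.4]
ddg_n = [1.0, 1.2, 1.6, 1.3]

# Synthetic 'experimental' Phi-values generated with an assumed chi_1 = 0.77.
phi_exp = [(0.77 * g1 + 0.23 * g2) / gn for g1, g2, gn in zip(ddg1, ddg2, ddg_n)]

chi1_fit = fit_chi1(phi_exp, ddg1, ddg2, ddg_n)
print(chi1_fit, 1.0 - chi1_fit)
```

With real data, the quality of the fit also depends on the accuracy of the calculated free-energy changes, which is a separate source of error from the experimental uncertainty of the $\Phi$-values.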

In this model, the magnitude of a $\Phi$-value depends on which structural elements are
affected, and on the mutation-induced free-energy changes of these elements. As in the
previous section, negative $\Phi$-values or $\Phi$-values larger than 1 can arise if a
mutation has both stabilizing and destabilizing effects on different structural elements. For
example, the model reproduces the negative $\Phi$-value -0.30 for a mutation of the FBP WW
domain that stabilizes hairpin 2 but destabilizes the hydrophobic core (see fig. 9),
according to calculations with the program FOLD-X. The model also leads to a consistent
interpretation of $\Phi$-values for the PIN WW domain [57, 59] with the structural
parameters $\chi_1 = 0.67 \pm 0.05$ and $\chi_2 = 0.33 \pm 0.05$ [63].

FIG. 8: $\Phi$-values for mutations that only affect hairpin 1 of the FBP WW domain [61].
Except for one outlier (open circle for mutation T9A), the $\Phi$-values are centered around
the mean value 0.81 ± 0.06, with deviations mostly within the experimental errors.

FIG. 9: Experimental versus theoretical $\Phi$-values for the FBP WW domain. The theoretical
$\Phi$-values have been obtained from a least-square fit of eq. (16) with the single fit
parameter $\chi_1$. From this fit, we obtain the values $\chi_1 = 0.77 \pm 0.05$ and
$\chi_2 = 1 - \chi_1 = 0.23 \pm 0.05$ for the fractions of the two transition-state
conformations in which either hairpin 1 or hairpin 2 is formed. The Pearson correlation
coefficient between theoretical and experimental $\Phi$-values is 0.90 if the outlier data
point for mutation T9A (open circle) is not considered, and 0.77 if the outlier is included
[63].

The deviations between experimental and theoretical $\Phi$-values in fig. 9 are mostly
within reasonable errors. It has been recently suggested that experimental errors for
$\Phi$-values may be underestimated since it is usually assumed that the errors in the
measured free-energy changes of the transition state and the folded state are independent,
which is not the case [66] (see also refs. [35, 67, 68, 69, 70] for a discussion on
experimental errors of $\Phi$-value measurements). Other sources of errors are the
simplifying modeling assumptions on the transition-state structure, and the calculations of
the mutation-induced free-energy changes.

In a related approach, Zarrine-Afsar et al. [71] have found that the folding rate changes
for different mutations of the same residue in the β-sheet of the Fyn SH3 domain correlate
with changes in β-sheet propensity, a simple measure for mutation-induced free-energy
changes in the β-sheet. More recently, Farber and Mittermaier [72] have modeled the effects
of different mutations of hydrophobic core residues with two structural parameters for
hydrophobic burial and native-like interactions.

V. DISCUSSION AND CONCLUSIONS 

We have considered the question of how transition states of two-state protein folding can be
reconstructed from mutational data for the folding dynamics. In the traditional
interpretation of the mutational data, the structural parameters are the degrees of structure
formation of each residue of the protein in the transition state. The number of structural
parameters thus is identical with the number of residues. In this interpretation, the
$\Phi$-values for mutations of a given residue are taken to be identical with the residue's
degree of structure formation in the transition state (see section II), which can lead to
inconsistencies: The traditional interpretation cannot capture different $\Phi$-values for
different mutations of the same residue, and 'non-classical' $\Phi$-values smaller than 0 or
larger than 1.

In sections III and IV, we have considered a different structural interpretation of
$\Phi$-values for mutations in α-helices and small β-sheet proteins. This novel
interpretation implies just two structural parameters per helix, the degrees of secondary and
tertiary structure of the helix in the transition state, and a single structural fitting
parameter for three-stranded β-sheet proteins, the relative degree of structure formation of
hairpin 1 and hairpin 2 in the transition state. Inconsistencies of the traditional
interpretation are resolved by splitting mutation-induced free-energy changes into secondary
and tertiary components. In particular, two negative $\Phi$-values for a mutation in the CI2
helix and a mutation in the FBP WW domain are traced back to free-energy changes of opposite
sign, without additional assumptions. The mutations stabilize the CI2 helix and hairpin 2 of
the FBP WW domain, respectively, but destabilize tertiary interactions with other structural
elements of the proteins. Other groups have suggested that negative $\Phi$-values may arise
from non-native interactions in the transition state [73], parallel folding routes with
energetic traps [74], experimental errors [68], or from mutation-induced free-energy changes
of the denatured state [75]. An extension of the novel interpretation to β-sheet proteins
larger than the three-stranded WW domains considered here requires the identification of
cooperative substructural elements. Candidates for such cooperative elements are β-hairpins
or other β-strand pairings [76].

Future MD folding simulations with detailed atomistic models may lead to a more complete
understanding of protein folding transition states and mutational effects on the folding
dynamics. Challenging goals are the characterization of transition-state conformations on
folding or unfolding trajectories [62] and the direct determination of $\Phi$-values from
folding simulations with mutants [52].